MyAnimeList Dataset Exploration
Anime is a style of Japanese film and television animation aimed at adults as well as children. A growing number of celebrities, politicians and sports personalities, among many others, have come out as anime fans, helping to recast the Otaku (diehard anime fans) as "cool", so to speak.
The anime industry was valued at over USD 24 billion in 2020 and is projected to reach USD 43 billion by 2027, a growth rate of 8.80% over the forecast period. More on this can be read here.
Naturally, the term anime is attributed to Japan but other countries like China and South Korea have also ventured into the production of anime. This study, however, will focus only on Japanese anime productions, the studios that have produced these anime and a sample of viewers' sentiments towards these anime.
The end goal is to establish a baseline regarding the optimum niches each studio should focus on in order to ensure their releases are successful.
# import all packages and set plots to be embedded inline
from wordcloud import WordCloud, STOPWORDS
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
import seaborn as sb
from plotly.offline import iplot, init_notebook_mode
import plotly.graph_objects as go
import plotly.express as px
# initialise Plotly's notebook mode so plots render inline (connected=True loads plotly.js from the CDN)
init_notebook_mode(connected=True)
%matplotlib inline
This study entails two related datasets from MyAnimeList (referred to as MAL in this document). The initial dataset, called animes.csv, contains over 7000 records of anime productions made between 1992 and 2018, with over 10 features containing metadata for each anime. The second dataset, named reviews_cleaned.csv, includes viewer review data obtained from MyAnimeList, with over 120,000 records and over 10 features covering the viewers' comments, ratings and attitude regarding each anime. These two datasets can be linked on the title feature. Here's a snippet of each dataset.
Anime Dataset | Viewer Reviews Dataset
:-------------------------:|:-------------------------:
This project focuses on applying and showcasing various data visualization techniques on the MAL datasets as part of Udacity's Data Analysis curriculum.
The main features that I will be investigating are the ranking scores provided by reviews_cleaned, from the Overall score through to the Character Development score, and how they are affected by various features obtained from the animes_cleaned dataset.
From the animes_cleaned dataset, genre, rating, season_released, source, studio, and type will be fundamental in categorizing each record and figuring out how they affect the overall scores provided either by MAL or by each viewer review. These will be used to determine how the genre, rating, timing of release, source material, studio and release format affect each production's ranking.
Loading the datasets
# custom function to read data into a Pandas DataFrame
def open_set(csv, sep=',', encoding='utf-8', usecols=None):
    # sep, encoding and usecols are forwarded to pd.read_csv
    df = pd.read_csv('data/'+csv, sep=sep, low_memory=False, encoding=encoding, usecols=usecols)
    return df
df_animes = open_set('animes_cleaned.csv')
df_reviews = open_set('reviews_cleaned.csv', encoding='latin')
# This custom function formats the output of describe(): floats are shown with one decimal place, text values are left as-is
def describe_pretty(x):
    if isinstance(x, float):
        return f'{x:.1f}'
    return x
for frame in [df_animes, df_reviews.iloc[:, 0:11]]:
    frame.info()  # info() prints its summary directly and returns None
    display(frame.head(), frame.describe().applymap(describe_pretty))
From df_animes, the mean number of episodes per title is 12.2. This aligns with the norm in the anime industry, where each cour/season usually consists of about 12 or 13 episodes.
From df_reviews, the Story score stands out as the lowest. This could be attributed to the fact that the narrative of each title is probably the most important factor when engaging with the show, and may therefore be judged most strictly.
From df_reviews, the Animation score stands out as the highest. This could be due to technological advancements as the years progressed from the early 1990s into the late 2010s.
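The lowest and highest average scores can also be read off programmatically rather than by eye. The following is a minimal sketch on a toy frame; the values are invented for illustration, and only the column names are taken from the review dataset.

```python
import pandas as pd

# Toy stand-in for df_reviews; the numbers are made up for illustration only.
toy_reviews = pd.DataFrame({
    "Story": [6, 7, 8],
    "Animation": [9, 8, 9],
    "Music": [7, 8, 8],
    "Character Development": [7, 7, 8],
    "Enjoyment": [8, 7, 9],
})

score_cols = ["Story", "Animation", "Music", "Character Development", "Enjoyment"]

# Mean of each score column, sorted from lowest to highest
mean_scores = toy_reviews[score_cols].mean().sort_values()

lowest_feature = mean_scores.index[0]    # feature with the smallest mean
highest_feature = mean_scores.index[-1]  # feature with the largest mean
print(mean_scores.round(2))
```

On the real data, the same two lines applied to df_reviews should reproduce the observation about Story and Animation, assuming those columns are numeric.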
Univariate Explorations
- First, I would like to investigate how production of titles has progressed through the years (1992 - 2018). Has there been a significant dip/upswing or rather a steady decline/rise?
fig = px.line(df_animes.groupby(['year_released'])['title'].count().reset_index(),
x='year_released', y='title',
title='Yearly Anime Productions (1992 - 2018)',
labels={'title':'Number of anime productions', 'year_released':'Year'}, orientation='v')
fig.update_layout(barmode='group', yaxis={'categoryorder':'total descending'})
fig.show()
Compared to the 90s, the anime industry has seen a progressive increase in titles produced from the 2000s through to the 2010s. This attests to the growing popularity of anime as a form of media.
The decline noticed between 2017 and 2018 is because not all the titles produced in 2018 were included in the dataset when it was compiled.
I'd like to take a look at the major players in the anime industry by checking out the Top 10 anime studios based on the frequency of titles produced.
Note that some titles were developed by a collaboration of studios, so each such collaboration will be counted as one entity instead of as separate studios.
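For comparison, here is a small sketch of the alternative to the one-entity approach: crediting every studio in a collaboration separately. It assumes collaborations are stored as comma-separated names in the studio column (which is how they are split later in this notebook); the toy titles and counts are invented.

```python
import pandas as pd

# Toy stand-in for df_animes; titles and studio pairings are made up.
toy_animes = pd.DataFrame({
    "title": ["A", "B", "C"],
    "studio": ["Madhouse", "Madhouse, Bones", "Bones"],
})

# One entity per record: "Madhouse, Bones" is counted as its own studio
as_entity = toy_animes["studio"].value_counts()

# Alternative: split collaborations and credit each studio individually
per_studio = (
    toy_animes["studio"]
    .str.split(",")   # "Madhouse, Bones" -> ["Madhouse", " Bones"]
    .explode()        # one row per studio
    .str.strip()      # remove the space left after the comma
    .value_counts()
)
print(as_entity, per_studio, sep="\n\n")
```

The two tallies answer different questions: the first ranks production teams as they appear in the data, the second ranks individual studios by involvement.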
# I tally the appearance of each studio per record as an account of their involvement in the production of the record's title.
fig = px.bar(df_animes.studio.value_counts().rename_axis('studio').reset_index(name='count').head(10),
             y='studio', x='count', text='studio', orientation='h',
             labels={'studio':'Anime Studios','count':'Number of anime produced'},
             title='Top 10 Studios Producing The Most Anime Between 1992-2018')
fig.update_yaxes(visible=False, showticklabels=False)
fig.update_layout(yaxis=dict(autorange="reversed"))
fig.show()
- Toei Animation and Sunrise lead the pack by a distance in terms of the number of titles they have produced. This could speak to the success of many of their releases: Toei with the world-renowned One Piece anime, and Sunrise with the Cowboy Bebop series and the Gundam franchise, to name a few.
Anime releases are usually classified based on the time of the year they are released i.e. Winter Anime (released between Nov and Jan), Spring Anime (Feb and April), Summer Anime (May and July) and Fall Anime (Aug and October).
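The season boundaries described above can be written down as a small lookup. This is only a sketch of the definition as stated in this notebook; month_to_season is a hypothetical helper, and how the season_released column was actually derived in the dataset is not shown here.

```python
def month_to_season(month: int) -> str:
    """Map a calendar month (1-12) to a release season, using the
    boundaries stated in this notebook (an assumption about the data)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    if month in (11, 12, 1):
        return "Winter"
    if month in (2, 3, 4):
        return "Spring"
    if month in (5, 6, 7):
        return "Summer"
    return "Fall"  # months 8, 9 and 10


print([month_to_season(m) for m in (1, 4, 7, 10)])
```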
Could there be an inherent reason why certain titles are successful, and could this be attributed to their seasonal release period? This is one question I will be looking to answer through various analyses linked to the Viewers' Reviews dataset.
fig = px.bar(df_animes['season_released'].value_counts().rename_axis('season_released').reset_index(name='count'),
x='season_released', y='count', text='count', title='Anime Released Per Season (1992-2018)',
labels={'season_released':'Season Released','count':'Number of anime releases'},
color_discrete_sequence=['black'])
fig.update_layout(barmode='group', yaxis={'categoryorder':'total descending'})
fig.update_yaxes(visible=False, showticklabels=False)
fig.show()
- Spring releases top the charts in terms of volume whereas Winter releases are the fewest of the lot.
fig = px.bar(df_animes['type'].value_counts().rename_axis('type').reset_index(name='count'),
x='type', y='count', text='count', title='Preferred Release Formats for Anime Productions (1992-2018)',
labels={'type':'Release Format','count':'Number of anime releases'},
color_discrete_sequence=['black'])
fig.update_layout(barmode='group', yaxis={'categoryorder':'total descending'})
fig.update_yaxes(visible=False, showticklabels=False)
fig.show()
As seen, TV is the most preferred of the various release formats. This could be due to ease of access for most viewers and pre-planned programme schedules on stations like NHK, MBS, Toonami, etc.
Original Net Animation (ONA) is a relatively new outlet for animation distribution that has been made viable by the increasing number of streaming media websites in Japan, hence its low numbers.
fig = px.bar(df_animes['source'].value_counts().rename_axis('source').reset_index(name='count'),
x='source', y='count', text='count', title='Anime Releases Per Source Material (1992-2018)',
labels={'source':'Source Material','count':'Number of anime releases'},
color_discrete_sequence=['black'])
fig.update_layout(barmode='group', yaxis={'categoryorder':'total descending'})
fig.update_yaxes(visible=False, showticklabels=False)
fig.show()
As expected, the most preferred source material for anime is manga (Japanese comics or graphic novels). As explained in this Quora post, "there's more depth in manga compared to original works".
A lot of anime originals have also been spawned in this period, notably movies like Kimi no Na wa and the majority of Studio Ghibli movies, among others.
fig = px.bar(df_animes['rating'].value_counts().rename_axis('rating').reset_index(name='count'),
x='rating', y='count', text='count', title='Anime Releases Per Viewer Rating (1992-2018)',
labels={'rating':'Rating','count':'Number of anime releases'},
color_discrete_sequence=['black'])
fig.update_layout(barmode='group', yaxis={'categoryorder':'total descending'})
fig.update_yaxes(visible=False, showticklabels=False)
fig.show()
The volume of PG-13 rated anime indicates a growing trend in the production of content with stronger language, extended violence, sexual situations or drug use. This article by the LA Times in 1997 describes how people are more likely to prefer stronger themes in content at much earlier ages.
There are very few R-rated anime titles, which suggests that anime is aimed mainly at teenagers and young adults.
What factors heavily influence successful anime from the late 20th century into the early 21st century?
Note: Since some (if not all) anime have been produced by a collaboration of major and minor studios, each such collaboration will be tabulated as one entity, separate from the individual studios' own work.
First, let's take a look at the top 100 ranked anime in the considered timeframe and investigate how frequently certain factors appear in that sample.
# create a dataframe of the 100 animes with the smallest avg_rank_score
top100_anime = df_animes.nsmallest(100,'avg_rank_score')
top100_anime
factors = [ 'studio', 'genre', 'rating','type', 'source', 'season_released','year_released']
def value_counter(df):
    # show the ten most frequent values of each factor
    for factor in factors:
        display(df[factor].value_counts().to_frame().head(10))
value_counter(top100_anime)
Studio Bones, Madhouse, A-1 Pictures, Kyoto Animation (KyoAni ♥) and J.C. Staff have had the highest number of top releases in the last 3 decades compared to the rest of the pack.
The Action genre is by a mile the most popular genre among titles in the anime industry.
TV releases have been by far the most popular medium of release in the anime industry.
Manga-sourced and Original anime are the front runners in terms of source material in the industry.
Fall and Spring anime seem to be the most popular among viewers based on the MAL anime dataset.
Despite their scarcity compared to anime rated PG-13, titles rated R-17+ seem to be quite popular in the anime industry. This could indicate a need for anime with more adult-related themes.
Bivariate Exploration
# aggregating the data by grouping it by the release format and source material
df_agg = df_animes.groupby(['type', 'source'])['title'].count().to_frame()
df_agg = df_agg['title'].groupby('type', group_keys=False).nlargest(5).reset_index()
df_agg.type = df_agg.type.astype('category')
fig = px.bar(df_agg,
x='source', y='title', color='type',
title='Anime Adaptations by Release Format and Source Material (1992-2018)',
labels={'type':'Release Format', 'source':'Source Material', 'title': 'Total Adaptations'})
fig.update_layout(barmode='group', yaxis={'categoryorder':'total descending'})
fig.show()
A variety of manga are mostly adapted into TV releases or OVAs (Original Video Animation).
Movie anime releases are more likely to be originals than works adapted from manga or other media formats.
OVAs are the main adaptation outlet for visual novels. This is because most visual novels are just hard to adapt. Most of the dialogue in any visual novel is internal and wouldn't translate very well into a visual medium like TV anime, and a visual novel can take upwards of ten hours to complete.
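The aggregation behind this chart counts titles per (type, source) pair and then keeps only the largest counts within each release format. A minimal sketch of that two-step pattern on invented data, keeping the single largest source per format for brevity:

```python
import pandas as pd

# Toy stand-in for df_animes; rows are made up for illustration.
toy = pd.DataFrame({
    "type": ["TV", "TV", "TV", "Movie", "Movie", "Movie"],
    "source": ["Manga", "Manga", "Original", "Original", "Original", "Manga"],
    "title": ["a", "b", "c", "d", "e", "f"],
})

# Step 1: number of titles for every (type, source) combination
counts = toy.groupby(["type", "source"])["title"].count()

# Step 2: within each release format, keep the n largest counts.
# group_keys=False avoids repeating the group label in the index.
top1 = counts.groupby(level="type", group_keys=False).nlargest(1).reset_index()
print(top1)
```

Because only the top entries per format are kept, sources that never make the cut for a given format simply do not appear in the chart; they are not necessarily absent from the data.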
df_animes.dtypes
cols_to_exclude = ['title', 'episodes', 'rank', 'image_url',
                   'score', 'score_rank', 'popularity', 'popularity_rank', 'members','favorites', 'avg_rank_score' ]
def categorizer(df, col_to_order=None, order=False):
    # cast low-cardinality columns (other than the excluded ones) to the category dtype
    for col in df.columns:
        if df[col].nunique() < 600 and col not in cols_to_exclude:
            df[col] = df[col].astype('category')
    # optionally mark one column as ordered; the order follows first appearance in the data
    if order:
        df[col_to_order] = df[col_to_order].cat.reorder_categories(df[col_to_order].unique(), ordered=True)
    return df
df_animes = categorizer(df_animes, 'year_released', True)
df_animes.info()
df_agg = df_animes[['studio', 'genre']].value_counts().reset_index().rename(columns={0:'count'}).head(50)
df_agg.info()
fig = px.bar(df_agg, x='studio', y='count', color='genre', labels={'count':'Number of titles produced', 'genre':'Genre', 'studio':'Studio'},
title='Distribution of Titles and Genres Produced by Various Studios')
fig.show()
There is a clear indication that many studios produce more Action, Adventure and Comedy titles than titles in other genres.
Kyoto Animation's model clearly revolves around Slice of Life and Comedy anime, reflected in their success with anime such as Violet Evergarden and Miss Kobayashi's Dragon Maid, among others.
df_agg = df_animes.groupby(['year_released', 'source', 'type'])['title'].count().to_frame()
df_agg = df_agg['title'].groupby('year_released', group_keys=False).nlargest(5).reset_index()
df_agg.year_released = df_agg.year_released.astype('category')
df_agg.source = df_agg.source.astype('category')
fig = px.line(df_agg,
x='year_released', y='title', color='source',
title='Anime Adaptation Sources (1992-2018)',
labels={'title':'Total adaptations made', 'year_released':'Year Released', 'source':'Source(s)'}, symbol='type',
height=1000)
fig.update_layout(barmode='group', yaxis={'categoryorder':'total descending'})
fig.show()
Manga adaptations have been on the rise since the 90s. They are predominantly the most popular source material for anime adaptations.
Original productions have also been on the rise: there was a steep increase in their number between 2013 and 2015, consequently toppling manga adaptations in the latter year. This could be attributed to the critical acclaim and success of 2013 releases such as The Tale of the Princess Kaguya by Studio Ghibli and The Garden of Words by renowned director Makoto Shinkai and CoMix Wave Films, 2014 releases such as Evangelion: Final, The Last Naruto Movie and Detective Conan: Dimensional Sniper, and 2015 releases such as Taifuu no Noruda, Psycho-Pass: The Movie and many others. It's worth noting that a number of these titles were original movie productions.
Surprisingly, there have been few manga-to-movie adaptations in the past three decades. This could be because many popular movies are original creative works intended to convey their thematic messages in one sitting, whereas it's much easier to condense manga into TV adaptations spanning a longer period of time.
Adaptations sourced from Light Novels and Visual Novels have struggled to hit the highs of manga-sourced and original titles.
df_stds = df_animes.copy()
df_stds[['season_released', 'year_released']] = df_stds[['season_released', 'year_released']].astype('str')
studios_avgs = df_stds.groupby('studio').mean(numeric_only=True).reset_index()
display(studios_avgs, studios_avgs.describe())
On average, most studios' releases consist of 12 episodes per cour/season. This matches the widely accepted norm in the anime community.
With a mean score of 6.43, it's fair to say that, based on MAL scores, viewers have been content with the anime titles produced from the early 90s till 2018.
studios_avgs.query('avg_rank_score == avg_rank_score.min()')
The joint work of Studio Gainax and Tatsunoko Production is the most highly rated on average among the anime community, that work being the world-renowned Neon Genesis Evangelion.
df_animes.query('title == "Neon Genesis Evangelion"').genre
- This was an original TV anime released in the Fall. Its main genre was Action and it had a rating of PG-13.
avg_rank_score in the last 3 decades?
studios_avgs.nsmallest(20,'avg_rank_score')
Notably, a few studios appear more than once in the highly ranked sample we've obtained. They are:
Madhouse, responsible for many popular anime like Death Note, Hunter x Hunter, ACCA-13 and the first season of One Punch Man.
Studio Gainax, responsible for what is probably the most widely known sci-fi anime, Neon Genesis Evangelion, as well as FLCL and Gurren Lagann, among other popular shows.
Satelight, known for assisting in several anime productions, has brought to life shows like Fairy Tail (a popular anime in the Shounen spectrum) and Log Horizon.
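from itertools import chain
# studio entities with the highest average number of favorites
fav_studios = list(studios_avgs.nlargest(100, 'favorites').studio)
# split collaborations into individual studio names, trimming stray whitespace
fstudios = [[name.strip() for name in item.split(',')] for item in fav_studios]
my_unnested_list = set(chain.from_iterable(fstudios))
print(my_unnested_list)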
Notable studios in this list based on current trends include MAPPA, A-1 Pictures, Studio Khara, White Fox, Madhouse, Studio Deen, Sunrise, Brain's Base, J.C. Staff, CoMix Wave Films, Studio NUT, Bones, Studio Trigger, Ufotable, Egg Firm, Wit Studio, Lerche, Kinema Citrus, Silver Link, Production I.G, Kyoto Animation, Studio Pierrot, Gainax, Cloverworks, Studio Ghibli, Shaft, and TROYCA.
My intention is to do an in-depth study of these studios with updated figures from the last 5 years, especially how trends might have shifted through the COVID era.
seasons = df_animes.groupby('season_released')[['score', 'favorites', 'members', 'avg_rank_score']].mean()
display(seasons, seasons.members.nlargest(10), seasons.favorites.nlargest(10), seasons.score.nlargest(10), seasons.avg_rank_score.nsmallest(10))
Undisputedly, Fall anime have been the most successful releases among the anime community in the past 20 years. This may explain why certain fan-favorite reboots and long-awaited releases like Bleach, Chainsaw Man and Jujutsu Kaisen, among others, are set for release in the Fall period.
This could also be attributed to changes in the 2-cour system, where anime is released in two parts across separate seasons. There could be an emergence of anime being released from Summer into Fall, causing a boost in Summer releases. One anime using this format in 2022 is Spy x Family.
df_agg = df_animes.groupby(['year_released', 'genre'])['title'].count().to_frame()
df_agg = df_agg['title'].groupby('year_released', group_keys=False).nlargest(5).reset_index()
df_agg.year_released = df_agg.year_released.astype('category')
display(df_agg.head(20), df_agg.tail(20))
fig = px.line(df_agg,
x='year_released', y='title', color='genre', text='title',
title='Anime Productions by Genre (Top 5 per Year)',
labels={'genre':'Genre', 'title':'Number of anime', 'studio':'Anime Studios', 'year_released':'Year Released'},
color_discrete_sequence=['rgb(127, 60, 141)', 'rgb(17, 165, 121)', 'rgb(57, 105, 172)',
'rgb(242, 183, 1)', 'rgb(231, 63, 116)', 'rgb(128, 186, 90)', 'rgb(230, 131, 16)', 'rgb(0, 134, 149)',
'rgb(207, 28, 144)', 'rgb(249, 123, 114)', 'rgb(165, 170, 153)', 'rgb(0,34,86,40)'],
orientation='v')
fig.update_layout(barmode='group', yaxis={'categoryorder':'total descending'}, width=1200, height=1200,
legend=dict(
x=0,
y=1,
traceorder="reversed",
title_font_family="Droid Sans Mono",
font=dict(
family="Droid Sans Mono",
size=12,
color="black"
),
bgcolor="LightSteelBlue",
bordercolor="Black",
borderwidth=2
),
xaxis=dict(tickmode='array'),
margin=dict(pad=10),
bargap=0.1)
fig.update_traces(textposition="bottom right")
fig.show()
Despite being relatively young, the Slice of Life genre has steadily grown in popularity since the 90s, with a major upswing in the early 2010s. The success of some of KyoAni's earlier work, such as Nichijou, Hyouka and Clannad, has contributed to this rise.
The Action and Comedy genres feature heavily in most anime titles.
Surprisingly, the number of Adventure-themed anime titles has seemingly stabilized over the years and has not experienced a boom like the Comedy genre.
df_agg = df_animes.groupby(['year_released', 'season_released', 'rating'])['title'].count().reset_index()
df_agg.sort_values('title', ascending=False, inplace=True)
fig = px.scatter(df_agg.head(200),
x='year_released', y='title', color='season_released', labels={'title':'Number of anime releases',
'year_released':'Year of Release', 'season_released':'Season'},
title="Seasonal Releases and Ratings Through the 90s to 2018",
color_discrete_sequence=["red", "green", "blue", "goldenrod", "magenta"],
symbol='rating', size=(df_agg['title'].head(200))/10)
fig.update_layout(barmode='stack', yaxis={'categoryorder':'total descending'}, height=700)
fig.show()
Fall and Spring releases have seen high volumes of production in the 2010s. This could be attributed to a number of factors, such as technological advancement and the boom in anime movie franchises.
The PG-13 rating has been predominant across the various seasons since the 90s.
Meanwhile, more sensitive anime with adult themes (R-rated content) picked up from 2004, gradually increasing in volume from 2008, with a majority of them released in the Winter period.
reviews = df_reviews.text.tolist()
reviews[0]
stopwords = set(STOPWORDS)
# lowercase every token and join all reviews into one string for the word cloud
comment_words = ''
for val in reviews:
    tokens = [token.lower() for token in val.split()]
    comment_words += " ".join(tokens) + " "
wordcloud = WordCloud(width=800, height=800,
                      background_color='white',
                      stopwords=stopwords,
                      min_font_size=10).generate(comment_words)
# plot the WordCloud image
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()
There are a few insights that I think we can derive from the words in this wordcloud:
Words like character, enjoyed, felt, main character, and story point towards a strong link between Character Development, Enjoyment, and Story.
Music and soundtrack speak to the importance of the music used in anime productions.
Art and character design are some of the highlights the viewership seeks in Animation.
Given the prevalence of TV content, it's also worth noting that the word episode is a mainstay among the viewership.
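A word cloud shows relative prominence but not actual counts. As a rough complement, word frequencies can be tallied directly. The sketch below uses only the standard library on two invented reviews, with a tiny hypothetical stop-word set standing in for the STOPWORDS used above.

```python
import re
from collections import Counter

# Invented review snippets; the real input would be the review text column.
toy_reviews = [
    "The story was great and the main character felt real.",
    "Great soundtrack, but the story dragged in every episode.",
]

# Hypothetical, deliberately tiny stop-word set for this example only
stop_words = {"the", "was", "and", "but", "in", "every"}

# Lowercase each review, extract word tokens, and drop stop words
tokens = [
    word
    for review in toy_reviews
    for word in re.findall(r"[a-z']+", review.lower())
    if word not in stop_words
]

word_counts = Counter(tokens)
print(word_counts.most_common(5))
```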
fig = px.pie(df_reviews, 'attitude', hole=.2)
fig.update_layout(
title_text="Overall Viewer Sentiment of Anime Titles Produced Between 1992 and 2018",
)
fig.show()
- The overriding sentiment from the viewership has been resoundingly Positive in general. This could be attributed to the "anime boom in the late 90s entering the new millennium".
plt.figure(figsize=(8, 6), dpi=80)
fig = sb.heatmap(df_reviews[['Story', 'Animation', 'Music', 'Character Development', 'Enjoyment', 'polarity', 'subjectivity']].corr(), annot=True, fmt='.2f', cmap='vlag_r', center=0)
plt.title('Correlation Between Ranking Features of Anime')
There are a number of interesting insights to be picked up from the correlation matrix plotted above. We can infer:
Generally, the animation, character development, enjoyment, music and story contribute highly to the final score each anime receives from the user, i.e. the Overall score.
The Character Development and Enjoyment factors are the most closely related to the Story factor when users make their reviews.
Generally, anime with a high Music score are more likely to have good Animation.
It would be right to posit that the more "human" features of anime contribute to how the reviewer responds, i.e. in which direction the show polarises the viewer. Consequently, Character Development, Enjoyment and Story are more likely to determine whether the reviewer is left with a negative or positive sentiment at the end of the show.
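Since the Overall column is not among the columns passed to the heatmap, the link between each feature and the Overall score can be checked separately. A minimal sketch on invented scores, assuming the real review frame has numeric Overall, Story and Animation columns:

```python
import pandas as pd

# Invented scores for five reviews; only the column names come from the dataset.
toy = pd.DataFrame({
    "Overall":   [9, 7, 4, 8, 3],
    "Story":     [9, 6, 4, 8, 2],
    "Animation": [8, 8, 6, 7, 7],
})

# Pearson correlation of every feature with Overall, strongest first
corr_with_overall = (
    toy.corr()["Overall"]
    .drop("Overall")              # a column's correlation with itself is always 1
    .sort_values(ascending=False)
)
print(corr_with_overall.round(2))
```

In this toy frame Story tracks Overall more closely than Animation does; whether the same holds for the real reviews is something the data has to show.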
The goal is to merge both datasets into one and ascertain how each of the categorizing features in the anime dataset scores when it comes to the viewers' sentiments.
df_anime_reviews = pd.merge(df_reviews, df_animes[['title', 'genre', 'studio', 'season_released', 'year_released',
'type', 'source', 'episodes', 'rating']], how='inner', on='title')
df_anime_reviews
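An inner merge silently drops reviews whose title has no match in the anime dataset. Before relying on the merged frame, it can help to count how many rows are lost. The sketch below uses invented frames; indicator and validate are standard pd.merge options, and the many_to_one check assumes each title appears once in the anime dataset, which may not hold for the real data.

```python
import pandas as pd

# Invented stand-ins for df_reviews and df_animes
toy_reviews = pd.DataFrame({"title": ["A", "A", "B", "Z"], "Overall": [8, 6, 9, 5]})
toy_animes = pd.DataFrame({"title": ["A", "B", "C"], "genre": ["Action", "Comedy", "Drama"]})

# A left merge with indicator=True labels each review as matched or not
check = toy_reviews.merge(
    toy_animes, how="left", on="title",
    indicator=True,            # adds a _merge column
    validate="many_to_one",    # raises if a title is duplicated on the anime side
)
unmatched_reviews = int((check["_merge"] == "left_only").sum())

# The inner merge keeps only the matched reviews
inner = toy_reviews.merge(toy_animes, how="inner", on="title")
print(unmatched_reviews, len(inner))
```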
title as the unique feature in the merged dataset, I'd like to study what's the percentage distribution of sentiments for all the various titles released in the study period.
values = [len(set(df_anime_reviews[df_anime_reviews.attitude == "Positive"].title)),
len(set(df_anime_reviews[df_anime_reviews.attitude == "Negative"].title)),
len(set(df_anime_reviews[df_anime_reviews.attitude == "Neutral"].title))]
names = ['Positive sentiments', 'Negative sentiments', 'Neutral sentiments']
fig = px.pie(names=names, values=values, title='Percentage Distribution of Titles Receiving Various Sentiments (1992 - 2018)')
fig.update_layout(legend=dict(
orientation="v",
title_text='Titles that received',
yanchor="bottom",
y=0.5,
xanchor="right",
x=0.85
))
- It's fair to say that the positive outlook on most titles slightly outweighs the volume of anime titles receiving negative criticisms.
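One caveat when reading this pie chart: a title that received both Positive and Negative reviews is counted in both slices, so the slices overlap. A small sketch on invented data contrasts the per-title tally with the share of individual reviews per attitude:

```python
import pandas as pd

# Invented merged review data: three titles, six reviews
toy = pd.DataFrame({
    "title":    ["A", "A", "A", "B", "B", "C"],
    "attitude": ["Positive", "Positive", "Negative", "Positive", "Neutral", "Negative"],
})

# Share of individual reviews per attitude (sums to 1)
review_share = toy["attitude"].value_counts(normalize=True)

# Number of distinct titles that received each attitude at least once
titles_per_attitude = toy.groupby("attitude")["title"].nunique()

print(review_share, titles_per_attitude, sep="\n\n")
```

Here the per-title counts add up to 5 even though there are only 3 titles, which is why the two views can tell different stories.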
Overall score and Polarity based on various anime ratings and genres?
fig = px.scatter(df_anime_reviews, x="Overall", y="polarity", color="genre", facet_col="rating", height=600, width=3000,
color_discrete_sequence=px.colors.qualitative.Dark24, labels={'genre':'Genre'},
facet_col_wrap=4)
fig.show()
When it comes to anime curated for children, i.e. G and PG-rated content, Adventure seems to be the standout genre, scoring highly on the Overall score and drawing generally Positive reactions.
Action, Comedy and Slice of Life content is highly popular among the PG-13 demographic.
Action, Comedy, and Psychological genres tend to perform well in the adult-rated anime. While some of this content may be curated for older audiences and could score highly on the Overall factor, certain titles can leave a Negative sentiment on the viewership.
While content centered around nudity i.e. Hentai might score highly in the Overall department, there is still a general Negative outlook on this genre.
df_anime_reviews.groupby(['year_released', 'attitude'])['title'].count().reset_index()
g= sb.PairGrid(data = df_anime_reviews.groupby(['year_released', 'attitude'])['title'].count().reset_index(),
x_vars='title', y_vars=['year_released', 'attitude'])
g.map(sb.pointplot)
plt.xlabel('Frequency')
g.fig.set_size_inches(15,15)
- There has been a steady increase in the number of reviews given for anime over the years with the number of Positive sentiments heavily outweighing the other two scopes of feeling.
fig = px.line(df_anime_reviews.groupby(['year_released', 'attitude'])['title'].count().reset_index(),
x='year_released', y='title', color='attitude', color_discrete_sequence=['red', 'blue', 'green'],
labels={'year_released':'Year Released', 'title':'Count', 'attitude':'Attitude'} ,orientation='v')
fig.show()
df_anime_reviews.query('year_released == 2006 and attitude == "Positive"').title.value_counts().head(10)
top2006 = df_anime_reviews.query('year_released == 2006 and attitude == "Positive"').title.value_counts().head(10).index.to_list()
df_animes.query('title in @top2006')
Generally the anime industry has had a steadily increasing Positive outlook since the turn of the 21st Century.
Neutral outlooks haven't had the most traction and are also outweighed by Negative sentiments.
Positive viewership reviews had the biggest bump in 2006. Notably, this was the year that popular releases like Madhouse's Death Note and Black Lagoon, and Sunrise's Code Geass: Lelouch of the Rebellion and Gintama aired.
The majority of these titles were Action titles, sourced from manga, aired on TV and released in Spring. These titles were either R-rated or PG-13 rated.
fig = px.line(df_anime_reviews.groupby(['season_released', 'attitude'])['title'].count().reset_index(),
x='season_released', y='title', color='attitude', color_discrete_sequence=['red', 'blue', 'green'],
orientation='v', labels={'season_released':'Season Released', 'title':'Count', 'attitude':'Attitude'},
title='Sentiment Change over Seasonal Anime Releases')
fig.show()
Spring and Fall releases tend to receive the highest number of Positive reviews compared to the other two seasons.
There's a general drop in the number of Positive reviews during the Winter release period.
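Raw review counts per season partly reflect how many titles were released in that season. To compare seasons on an equal footing, the counts can be converted to within-season shares. A minimal sketch with pd.crosstab on invented data:

```python
import pandas as pd

# Invented reviews: Spring has more reviews overall than Winter
toy = pd.DataFrame({
    "season_released": ["Spring"] * 4 + ["Winter"] * 2,
    "attitude": ["Positive", "Positive", "Positive", "Negative", "Positive", "Negative"],
})

# Raw number of reviews per season and attitude
counts = pd.crosstab(toy["season_released"], toy["attitude"])

# Share of each attitude within a season (each row sums to 1)
shares = pd.crosstab(toy["season_released"], toy["attitude"], normalize="index")

print(counts, shares, sep="\n\n")
```

A season with fewer Positive reviews in absolute terms can still have a comparable or higher Positive share, so both views are worth checking.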
df_anime_reviews.year_released = df_anime_reviews.year_released.astype('category')
fig = px.histogram(df_anime_reviews, x='attitude', color='genre', facet_col='year_released',
facet_col_wrap=4, height= 3500, width=1000,
category_orders={"year_released": np.arange(1992, 2019)},
labels={'year_released':'Year Released', 'genre':
'Genres', 'attitude':'Attitude'})
fig.show()
- Action anime has been popular since the 90s while Comedy and Slice of Life genres picked up in the 2000s with 2008 being the peak year for the latter genre.
df_review_seasons = df_anime_reviews.groupby(['season_released', 'attitude'])[['Overall', 'Story', 'Animation', 'Music',
'Character Development', 'Enjoyment', 'polarity', 'subjectivity']].mean()
df_review_seasons
From this subset of data, we can observe certain opinions of the viewership about the seasons.
The viewership looks forward to Fall releases the most, given their comparatively high Positive sentiment scores.
Seemingly, Summer anime releases have received the lowest Animation scores.
studio_sentiments = df_anime_reviews.groupby(['studio', 'attitude'])['polarity'].count().reset_index()
studio_sentiments.rename(columns={'polarity':'count'}, inplace=True)
display(studio_sentiments.query('attitude == "Positive"').sort_values('count', ascending=False).head(10), studio_sentiments.query('attitude == "Negative"').sort_values('count', ascending=False).head(10))
Madhouse, Kyoto Animation, A-1 Pictures and Bones have garnered the most critical acclaim in the past three decades.
The overall viewership sentiment towards the top studios is resoundingly Positive, with Negative sentiment having a relatively low volume.
df_review_studios = df_anime_reviews.groupby(['studio'])[['Overall', 'Story', 'Animation', 'Music',
'Character Development', 'Enjoyment', 'polarity', 'subjectivity']].mean()
df_review_studios.sort_values('Overall',ascending=False).head(20)
def leader(df, stats=None):
    # print the top-ranked index label (e.g. studio or genre) for each statistic
    for stat in stats or []:
        print(f'The best ranked in terms of {stat} is', df.nlargest(1, stat).index[0])
stats = ['Overall', 'Story', 'Animation', 'Music', 'Character Development', 'Enjoyment', 'polarity']
leader(df_review_studios, stats)
fig = px.imshow(df_review_studios[['Story', 'Animation', 'Music', 'Character Development',
'Enjoyment', 'polarity', 'subjectivity']].corr(), color_continuous_scale='icefire', width=1000, height=800, title='Relationship between Features of Anime as Ranked by Audiences', text_auto='.2f')
fig.show()
Character Development and Enjoyment have a strong correlation with the outcome of the Story score. If an anime does not deliver on the two former features, it's highly likely that it will score quite poorly on the Story factor.
Animation and Music go hand in hand to a certain degree. This explains why certain action scenes in anime could go for fast-paced upbeat tunes while emotive scenes would require songs along the melancholic to poignant scale.
The outstanding sentiment left on a viewer could be highly influenced by how the anime scores in the Character Development and Enjoyment feature.
df_review_genres = df_anime_reviews.groupby(['genre'])[['Overall', 'Story', 'Animation', 'Music',
'Character Development', 'Enjoyment', 'polarity', 'subjectivity']].mean()
df_review_genres.sort_values('Overall',ascending=False).head(20)
stats = ['Overall', 'Story', 'Animation', 'Music', 'Character Development','Enjoyment', 'polarity', 'subjectivity']
leader(df_review_genres, stats)
- The Thriller and Supernatural genres stand out as having the highest average ratings for most features reviewed by the viewership.
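The leading group for every feature can also be collected into a single table instead of printed line by line. This is a sketch of that idea using DataFrame.idxmax on invented averages; the genre names and numbers are illustrative only.

```python
import pandas as pd

# Invented per-genre averages, shaped like the grouped means computed above
toy_means = pd.DataFrame(
    {"Overall": [8.1, 7.4, 7.9], "Story": [7.9, 7.0, 8.2]},
    index=pd.Index(["Thriller", "Comedy", "Supernatural"], name="genre"),
)

# For each column, the index label (genre) holding the maximum value
leaders = toy_means.idxmax()
print(leaders)
```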
df_review_type = df_anime_reviews.groupby(['type'])[['Overall', 'Story', 'Animation', 'Music',
'Character Development', 'Enjoyment']].mean()
fig = px.bar(df_review_type.sort_values('Overall',ascending=False), text='value', text_auto='.1f',
title='Feature Scores across Various Anime Release Formats',
labels={'value':'Score', 'type':'Release Format', 'variable':'Features'})
fig.update_layout(barmode='stack')
The viewership generally prefers Movie releases when it comes to the Overall component of each anime title.
Movie and, surprisingly, Music releases tend to have the best-looking Animation according to the viewership. However, the latter score quite low in terms of Character Development, since such videos are 3-5 minute snippets. Personally, I highly recommend some of Eve's or n-buna/Yorushika's works.
OVAs are the least favoured kind of releases based on the viewership's preferences.
df_review_rating = df_anime_reviews.groupby(['rating'])[['Overall', 'Story', 'Animation', 'Music',
'Character Development', 'Enjoyment']].mean()
fig = px.bar(df_review_rating.sort_values('Overall',ascending=False), text='value', text_auto='.1f',
labels={'rating': 'Rating', 'value':'Score', 'variable':'Scoring Feature'},
title='Scores across Various Features and Ratings')
fig.update_layout(barmode='stack')
- It is interesting to note that, according to the viewership, the Overall satisfaction factor decreases the more profane/provocative the content gets.